ABSTRACT

For the problem of individual prediction in linear regression
models, that is, estimation of a linear combination of regression
coefficients, mean square error behavior of a general class of
adaptive predictors is examined.

1.  INTRODUCTION 

Suppose the usual linear regression model with fixed regressors,
$Y = X\beta + \epsilon$, with $Y$ $n \times 1$, $X$ $n \times p$ of full
rank, $\beta$ $p \times 1$, and $\epsilon$ $n \times 1$ with
$\epsilon \sim (0, \sigma^2 I)$. Let $\hat\beta_{LS} = (X^TX)^{-1}X^TY$
denote the ordinary least squares estimator of $\beta$. At a new vector
of predictor values, $X_0$, we seek to estimate $X_0^T\beta$. Using mean
square error as a criterion, results of Cohen (1965) show that if
$\epsilon$ is normally distributed, $aX_0^T\hat\beta_{LS}$ is an
admissible estimator of $X_0^T\beta$ for $0 \le a \le 1$, e.g., the UMVU
predictor is admissible. In fact, a predictor of the form $\ell^TY$ is
admissible for $X_0^T\beta$ iff

$$(2\ell - X(X^TX)^{-1}X_0)^T(2\ell - X(X^TX)^{-1}X_0) \le X_0^T(X^TX)^{-1}X_0.$$

In the sequel, we study the MSE under normality of predictors
of the form $X_0^T\hat\beta_C$ where

$$\hat\beta_C = C\hat\beta_{LS} + (I - C)\beta^* \tag{1}$$

with $C$ a matrix, usually data dependent, and $\beta^*$ a specified
vector. Such $\hat\beta_C$ include most alternatives to $\hat\beta_{LS}$
discussed in the literature. Earlier work in this direction appears in
Baranchik (1964) and Radhakrishnan (1970).


2.  NOTATION  AND  MOTIVATION 

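To simplify matters, we convert to canonical form. Let
$\hat\alpha = P\hat\beta_{LS}$, $P$ orthogonal such that
$P(X^TX)^{-1}P^T = D^{-1}$, $D$ diagonal with diagonal elements $d_i$.
Define $\alpha = P\beta$, $\ell = PX_0$ and for convenience set
$\beta^* = 0$. For the moment assume $\sigma^2$ known. Our problem now
is to estimate $\theta = \ell^T\alpha$ given
$\hat\alpha \sim N(\alpha, \sigma^2D^{-1})$, wishing to do well near
$\theta = 0$.

Let $U = \ell^T\hat\alpha$, $Z = \hat\alpha^TD\hat\alpha$,
$q = \ell^TD^{-1}\ell$, $V = Z - U^2/q$, $\lambda = \alpha^TD\alpha$ and
$\phi = \lambda - \theta^2/q$. Then $U$, $V$ are independent,
$U \sim N(\theta, \sigma^2q)$, and
$V \sim \sigma^2\chi^2_{p-1}(\phi/\sigma^2)$.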
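The canonical quantities can be computed directly from data. The following is a minimal Python sketch, assuming NumPy is available; the function name `canonical_quantities` and the simulated `X`, `Y`, `x0` are hypothetical and serve only as illustration. It takes $P$ as the transposed eigenvector matrix of $X^TX$, which is one valid choice of orthogonal $P$ with $P(X^TX)^{-1}P^T = D^{-1}$.

```python
import numpy as np

def canonical_quantities(X, Y, x0):
    """Return (U, Z, q, V) of the canonical form for design X, response Y, new point x0."""
    XtX = X.T @ X
    beta_ls = np.linalg.solve(XtX, X.T @ Y)   # ordinary least squares estimate
    d, E = np.linalg.eigh(XtX)                # XtX = E diag(d) E^T
    P = E.T                                   # then P (XtX)^{-1} P^T = diag(1/d) = D^{-1}
    alpha_hat = P @ beta_ls
    ell = P @ x0
    U = ell @ alpha_hat                       # equals x0^T beta_ls, the LS predictor
    Z = alpha_hat @ (d * alpha_hat)           # alpha_hat^T D alpha_hat
    q = ell @ (ell / d)                       # ell^T D^{-1} ell
    V = Z - U**2 / q
    return U, Z, q, V

# Simulated data, for illustration only (not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
Y = X @ np.array([1.0, -0.5, 0.0, 2.0]) + rng.normal(size=30)
x0 = np.array([0.5, 1.0, -1.0, 0.25])
U, Z, q, V = canonical_quantities(X, Y, x0)
```

In this sketch $U$ coincides with $X_0^T\hat\beta_{LS}$ and $q$ with $X_0^T(X^TX)^{-1}X_0$, so the canonical form changes coordinates without changing the prediction problem.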

Consider a general adaptive predictor $\hat\theta(\hat\alpha)$ of the form

$$\hat\theta(\hat\alpha) = \sum_i h_i(\hat\alpha)\,\ell_i\hat\alpha_i. \tag{2}$$

Most predictors of $\theta$ discussed in the literature are special
cases of (2). Apart from the LS predictor, $U$, we have:

i) A class of predictors given in Thompson (1968)

$$T_m = \frac{U^2}{U^2 + m\sigma^2q}\,U,$$

$m$ a known constant, i.e.,
$h_i(\hat\alpha) = \dfrac{(\hat\alpha^T\ell)^2}{(\hat\alpha^T\ell)^2 + m\sigma^2q}$.

ii) A class of predictors given in Mehta and Srivastava (1971)

$$MS_{b_1,b_2} = \left(1 - b_1e^{-b_2U^2/(\sigma^2q)}\right)U,$$

$0 < b_1 < 1$, $b_2 > 0$, $b_1$, $b_2$ known, i.e.,
$h_i(\hat\alpha) = 1 - b_1\exp(-b_2(\hat\alpha^T\ell)^2/\sigma^2q)$.

iii) A predictor arising from the James-Stein estimator adapted
for unequal variances (Sclove 1968)

$$JS_c = \left(1 - \frac{c\sigma^2}{Z}\right)U,$$

$c$ known, usually taken equal to $p - 2$. A positive part adjustment
should be applied so that
$h_i(\hat\alpha) = [1 - c\sigma^2(\hat\alpha^TD\hat\alpha)^{-1}]^+$.

iv) Predictors arising from (simple) ridge estimators

$$R_{k_j} = \sum_i \frac{d_i}{d_i + k_j}\,\ell_i\hat\alpha_i,$$

where $k_j$ is based on the data, i.e.,
$h_i(\hat\alpha) = d_i/(d_i + k_j(\hat\alpha))$. $k$'s discussed include:

$k_1(\hat\alpha) = \sigma^2p(\hat\alpha^T\hat\alpha)^{-1}$ (Hoerl, Kennard, and Baldwin 1975),

$k_2(\hat\alpha) = \sigma^2pZ^{-1}$ (Lawless and Wang 1976),

$k_3(\hat\alpha)$, the solution to
$\sum_i\hat\alpha_i^2d_i^2(d_i + k_3)^{-2} = \sum_i\hat\alpha_i^2 - \sigma^2\sum_id_i^{-1}$
(McDonald and Galarneau 1975),

$k_4(\hat\alpha)$, the solution to
$\sum_i\hat\alpha_i^2d_i(d_i + k_4)^{-1} = \sigma^2p$
(the RIDGM estimator of Dempster, Schatzoff and Wermuth 1977).
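The first three data-based ridge constants can be sketched in Python as follows, assuming NumPy. The function names, the bisection solver for $k_3$, and the example values are illustrative assumptions rather than anything prescribed by the cited papers. In particular, raising an error when the right-hand side of the $k_3$ equation is not positive is a choice made here only to keep the sketch short.

```python
import numpy as np

def k1_hkb(alpha_hat, sigma2):
    """k_1 = sigma^2 p / (alpha_hat^T alpha_hat)."""
    return sigma2 * alpha_hat.size / (alpha_hat @ alpha_hat)

def k2_lw(alpha_hat, d, sigma2):
    """k_2 = sigma^2 p / Z with Z = alpha_hat^T D alpha_hat."""
    return sigma2 * alpha_hat.size / (alpha_hat @ (d * alpha_hat))

def k3_mg(alpha_hat, d, sigma2, tol=1e-12):
    """Solve sum a_i^2 d_i^2 (d_i+k)^-2 = sum a_i^2 - sigma^2 sum 1/d_i by bisection."""
    target = alpha_hat @ alpha_hat - sigma2 * np.sum(1.0 / d)
    if target <= 0:
        raise ValueError("right-hand side is not positive; no positive root")

    def lhs(k):
        return np.sum(alpha_hat**2 * d**2 / (d + k) ** 2)

    lo, hi = 0.0, 1.0
    while lhs(hi) > target:          # lhs decreases in k, so expand until bracketed
        hi *= 2.0
    while hi - lo > tol * (1.0 + hi):
        mid = 0.5 * (lo + hi)
        if lhs(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ridge_predictor(alpha_hat, d, ell, k):
    """sum_i d_i/(d_i + k) * ell_i * alpha_hat_i."""
    return np.sum(d / (d + k) * ell * alpha_hat)

# Hypothetical canonical-form inputs, for illustration only.
alpha_hat = np.array([2.0, -1.0, 0.5])
d = np.array([4.0, 1.0, 0.25])
ell = np.array([1.0, 1.0, 1.0])
sigma2 = 0.1
k1, k2, k3 = k1_hkb(alpha_hat, sigma2), k2_lw(alpha_hat, d, sigma2), k3_mg(alpha_hat, d, sigma2)
```

Setting `k = 0` in `ridge_predictor` recovers the LS predictor $U$, which gives a quick sanity check on the shrinkage weights.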

A subclass of (2) which includes (i), (ii), (iii), and $R_{k_2}$ has the
form

$$\hat\theta(\hat\alpha) = \sum_i h_i(U,Z)\,\ell_i\hat\alpha_i. \tag{3}$$

A further subclass which still includes (i), (ii), and (iii) is

$$\hat\theta(\hat\alpha) = h(U,Z) \cdot U. \tag{4}$$

When $D = I$, all of the aforementioned estimators belong to (4).
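For predictors of the single-factor form $h(U,Z) \cdot U$, the shrinkage factors of (i), (ii) and (iii) can be written as short Python functions. This is a minimal sketch assuming NumPy; the function names and the default values of `m`, `b1`, `b2` are hypothetical choices for illustration.

```python
import numpy as np

def h_thompson(U, q, sigma2, m=1.0):
    """Thompson (1968) factor: U^2 / (U^2 + m sigma^2 q)."""
    return U**2 / (U**2 + m * sigma2 * q)

def h_mehta_srivastava(U, q, sigma2, b1=0.5, b2=0.5):
    """Mehta and Srivastava (1971) factor: 1 - b1 exp(-b2 U^2 / (sigma^2 q))."""
    return 1.0 - b1 * np.exp(-b2 * U**2 / (sigma2 * q))

def h_james_stein(Z, sigma2, c):
    """Positive-part James-Stein factor: [1 - c sigma^2 / Z]^+."""
    return np.maximum(0.0, 1.0 - c * sigma2 / Z)

def predict(h, U):
    """Adaptive predictor h * U."""
    return h * U
```

All three factors lie in $[0,1]$ and are symmetric in $U$ about 0, so each one shrinks the LS predictor $U$ toward 0.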

Taking another point of view (see e.g. Thompson (1968)), if
$h_i$ in (3) is constant, the optimal $h_i$ to minimize the MSE are
easily obtained:

$$h_i = \frac{\theta\,d_i\alpha_i}{(\sigma^2 + \lambda)\,\ell_i}. \tag{5}$$

An estimator of $h_i$ would be of the form $c_i(\hat\alpha,\sigma^2)$,
leading to a predictor belonging to (2). If (5) was estimated by
$c(U,Z,\sigma^2)$, the class (4) results.

Suppose we take a Bayesian approach using a prior which
centers $\theta$ at 0, where we want to do well. More precisely, let $Q$
be an orthogonal matrix such that
$QD^{1/2}\alpha = (\theta/\sqrt{q},\ \eta^T)^T$ where $\eta$ is
$(p-1) \times 1$ and $\eta^T\eta = \phi$. If we take as our prior




(6/^)  <v  N(0,(p  °  )),  p  known, 

p-1 

then under squared error loss, the Bayes estimate of $\theta$ is
$(\gamma + \sigma^2)^{-1}\gamma U$. Since $(U,Z)$ is sufficient under the
marginal distribution of $u = QD^{1/2}\hat\alpha$, an "empirical Bayes"
estimator of $\theta$ takes the form in (4).

3.  EXAMINATION  OF  THE  MSE 

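We can calculate the MSE for the general predictor in (2) in
terms of the $h_i$, assuming $\sigma^2$ known.$^1$

Theorem 1. If $E\left|\dfrac{\partial h_i}{\partial U}\hat\alpha_i\right| < \infty$,
$i = 1, 2, \ldots, p$, then

$$MSE(\hat\theta) = \sigma^2q + E(\hat\theta - U)^2
  - 2\sigma^2E\sum_i\ell_i^2d_i^{-1}(1 - h_i)
  + 2\sigma^2q\,E\sum_i\ell_i\hat\alpha_i\frac{\partial h_i}{\partial U}. \tag{6}$$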

Proof. By direct calculation

$$MSE(\hat\theta) = \sigma^2q + E(\hat\theta - U)^2 - 2E\{r(\hat\alpha)(U - \theta)\} \tag{7}$$

where $r(\hat\alpha) = \sum_i(1 - h_i)\ell_i\hat\alpha_i$. Stein's identity
(Stein 1981, p. 1148) converts the expectation in the right-most term
of (7) to $\sigma^2q\,E(\partial r/\partial U)$. Simplification yields (6).

$\partial h_i/\partial U$ would be calculated using the transformation
$\hat\alpha = D^{-1/2}Q^Tu$ of the previous section. In the case of (3),
it can be calculated directly writing $h_i$ as a function of $U$ and $V$.
For predictors of the form (4), $MSE(\hat\theta)$ depends only on
$\theta$ and $\phi$ and is given as Corollary 1.

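Corollary 1. For the predictors in (4), if
$E\left|U\,\dfrac{\partial h}{\partial U}\right| < \infty$,

$$MSE(\hat\theta) = \sigma^2q + E(1 - h)^2U^2
  + 2\sigma^2q\,E\,U\frac{\partial h}{\partial U} - 2\sigma^2q\,E(1 - h). \tag{8}$$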
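The MSE expression of Corollary 1 can be checked numerically against the MSE computed directly from its definition. The Python sketch below does this for the Thompson factor $h = U^2/(U^2 + m\sigma^2q)$, which depends on $U$ alone, using Gauss-Hermite quadrature from NumPy. Here `s2` stands for $\sigma^2q$; the function names and the values of `theta`, `s2` and `m` are hypothetical, and the quadrature order of 100 is an arbitrary choice.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Nodes and weights for the weight function exp(-x^2 / 2); rescale to a N(0, 1) average.
NODES, WEIGHTS = hermegauss(100)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)

def expect(f, theta, s2):
    """Approximate E f(U) for U ~ N(theta, s2) by Gauss-Hermite quadrature."""
    return np.sum(WEIGHTS * f(theta + np.sqrt(s2) * NODES))

def mse_direct(theta, s2, m):
    """E (h(U) U - theta)^2 computed from the definition."""
    h = lambda u: u**2 / (u**2 + m * s2)
    return expect(lambda u: (h(u) * u - theta) ** 2, theta, s2)

def mse_corollary(theta, s2, m):
    """s2 + E(1-h)^2 U^2 + 2 s2 E[U dh/dU] - 2 s2 E(1-h)."""
    h = lambda u: u**2 / (u**2 + m * s2)
    dh = lambda u: 2.0 * u * m * s2 / (u**2 + m * s2) ** 2   # derivative of h in U
    return (s2
            + expect(lambda u: (1.0 - h(u)) ** 2 * u**2, theta, s2)
            + 2.0 * s2 * expect(lambda u: u * dh(u), theta, s2)
            - 2.0 * s2 * expect(lambda u: 1.0 - h(u), theta, s2))

direct = mse_direct(theta=1.5, s2=1.0, m=1.0)
via_corollary = mse_corollary(theta=1.5, s2=1.0, m=1.0)
```

Agreement of the two values up to quadrature error illustrates how Stein's identity replaces the cross term $E\{(U - \theta)g\}$ by an expectation involving only derivatives of $h$.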

Under (4) choices of $h$ in the literature are such that $h$ is
symmetric in $U$ about 0 and restricted to $[0,1]$. Using essentially
the argument of Efron and Morris (1976, p. 14), positive part
restriction of $h$ uniformly reduces risk. Restriction to $h \le 1$ is
less clear. Taking $h \ge 0$, the predictor $h^* \cdot U$ where
$h^* = \min(h,1)$ does not necessarily dominate $h \cdot U$. For
example, let

$$h(U,V) = \begin{cases} \ldots \\ \ldots & \text{elsewhere.} \end{cases}$$

Then at each $\phi$, for $|\theta|$ sufficiently large, MSE of
$h(U,V)U$ is less than MSE of $h^*(U,V)U$. Nonetheless, to improve in a
neighborhood of a specified $\theta_0$ requires convex combinations of
$U$ and $\theta_0$. Theorem 2 details MSE properties of predictors in
(4) relative to the MSE of $U$.

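Theorem 2. For $\hat\theta(\hat\alpha)$ in (4) with $h \in [0,1]$, let
$h$ be symmetric in $U$ about 0. Let $g = (1 - h)U$ with
$\lim_{|U|\to\infty}\sup_V g = 0$ and assume $\partial g/\partial U$
exists for all $U$. Finally, assume that the Lebesgue measure of
$A = \{(U,V) : h(U,V) < 1\}$ is greater than 0. Then,

(i) For each $\phi$ there is a neighborhood $N$ of $\theta = 0$ where
$MSE(\hat\theta;\theta,\phi) < \sigma^2q$.

(ii) $MSE(\hat\theta;\theta,\phi)$ is bounded and
$\lim_{|\theta|\to\infty}MSE(\hat\theta;\theta,\phi) = \sigma^2q$.

(iii) $MSE(\hat\theta;\theta,\phi)$ is symmetric in $\theta$ about 0 and
$\left.\dfrac{\partial MSE(\hat\theta;\theta,\phi)}{\partial\theta}\right|_{\theta=0} = 0$.

(iv) $g^2 - 2\sigma^2q\,\partial g/\partial U$ changes sign at least once
in $0 < U < \infty$. If $g^2 - 2\sigma^2q\,\partial g/\partial U$ changes
sign $b$ times in $0 < U < \infty$, then for fixed $\phi$,
$MSE(\hat\theta;\theta,\phi) - \sigma^2q$ changes sign at most $2b$ times.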

Proof. The proof of (i) is clear since
$MSE(\hat\theta;0,\phi) < \sigma^2q$. For (ii),

$$MSE(\hat\theta;\theta,\phi) = \sigma^2q + Eg^2 - 2E(U - \theta)g. \tag{9}$$

Given $\epsilon$, $\exists\,u_0$ such that for all $V$, $|U| > u_0$
implies $|g| < \epsilon$, and $\exists\,\theta_0 > 0$ such that
$|\theta| > \theta_0$ implies $P(|U| > u_0) > 1 - \epsilon$. Then the
second term and the third term (using the Cauchy-Schwarz inequality) in
(9) can be made arbitrarily small as $|\theta| \to \infty$. It is clear
that the r.h.s. of (9) is bounded. (iii) is obvious. The first part of
(iv) follows since $U$ is admissible. The second part follows from the
sign change theorem of Karlin (1957) by noting that
$MSE(\hat\theta;\theta,\phi) - \sigma^2q = E(g^2 - 2\sigma^2q\,\partial g/\partial U)$.

Remark 1. Predictors in (i), (ii), (iii) of Section 2 satisfy
the conditions of Theorem 2.

Remark 2. Result (ii) is a simple case of the "tail minimaxity"
notion of Berger (1976).

Remark 3. In (iii), $\inf_\theta MSE(\hat\theta;\theta,\phi)$ need not
occur at $\theta = 0$. If, however, $h(U,V)$ is increasing in $|U|$ it
must, as may be shown by establishing the result for $h$ a step function
in $U$. An induction argument proves this.

Remark 4. If $b = 1$ in (iv), then a graph of
$MSE(\hat\theta;\theta,\phi)$ for $\theta \ge 0$ must start below
$\sigma^2q$ at $\theta = 0$, cross above $\sigma^2q$ at some $\theta$ and
then asymptotically return to $\sigma^2q$ from above. Any predictor
satisfying the conditions of Theorem 2 must necessarily perform worse
for a set of $\theta$'s near 0 than for a set arbitrarily far away.
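The shape described in Remark 4 can be illustrated numerically. The Python sketch below evaluates the MSE of the Thompson predictor $U^3/(U^2 + m\sigma^2q)$ on a grid of $\theta \ge 0$ by Gauss-Hermite quadrature and counts sign changes of $MSE - \sigma^2q$. The values $m = 1$ and $\sigma^2q = 1$ (written `s2`), the grid, the quadrature order and the function name are assumptions made for illustration. For this choice $g^2 - 2\sigma^2q\,\partial g/\partial U = (3U^2 - 2)/(1 + U^2)^2$, which changes sign once on $U > 0$, so $b = 1$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Quadrature rule for averages over a standard normal variable.
NODES, WEIGHTS = hermegauss(100)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)

def mse_thompson(theta, s2=1.0, m=1.0):
    """MSE of U^3 / (U^2 + m s2) as a predictor of theta, with U ~ N(theta, s2)."""
    u = theta + np.sqrt(s2) * NODES
    pred = u**3 / (u**2 + m * s2)
    return np.sum(WEIGHTS * (pred - theta) ** 2)

thetas = np.linspace(0.0, 30.0, 301)
excess = np.array([mse_thompson(t) for t in thetas]) - 1.0   # MSE minus s2 = 1

# Number of sign changes of MSE - s2 along the grid of theta >= 0.
sign_changes = int(np.sum(np.diff(np.sign(excess)) != 0))
```

On this grid the excess starts negative at $\theta = 0$, becomes positive after a single crossing, and is small but still positive at the largest $\theta$, in line with the pattern stated in the remark.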

Remark 5. No immediate extension of Theorem 2 to
$\hat\theta(\hat\alpha)$ as in (3) is available. For an arbitrary member
of (3), MSE depends upon $\theta$ and $\eta$ and, even if each $h_i$
meets the "tail minimaxity" condition, need not approach $\sigma^2q$ as
$|\theta| \to \infty$ for fixed $\eta$.

Remark 6. Theorem 2 is readily extended to the comparison of
any pair of predictors in (4).

We conclude with a comment on admissibility for the above
predictors. Within the class of predictors based solely on $U$, i.e.,
$h(U)U$, those meeting the conditions of Theorem 2 will either be
admissible or, if not, then improvement cannot be substantial. We
employ ideas of Chow and Hwang (1984). Suppose $\hat\theta_1(U)$ is to
dominate $\hat\theta_0 = h(U)U$ meeting the conditions of Theorem 2. We
can write $\hat\theta_1$ as $h^*(U)U$, and assume $h^* \ge 0$. For
$\hat\theta_1$ to dominate $\hat\theta_0$ requires, when $|U|$ is large,
that generally $h^*$ be closer to 1 than $h$ and that, when $|U|$ is
small, generally $h^*$ be closer to 0 than $h$. A simplified picture of
$\hat\theta_0$, $\hat\theta_1$ for $U > 0$ might look like
[Figure: simplified sketch of the two predictors for U > 0.]


But, at $\theta = \theta_0$, it would be almost impossible for
$\hat\theta_1$ to dominate. Thus, the simplest $h^*$ which realistically
could dominate would have to have at least 3 sign changes for
$h - h^*$ on $U > 0$. For such an $h^*$, its form would be complicated,
domination would be difficult to show, and improvement would be minimal.

This argument does not extend to the more general class (4).
Though $U$ and $V$ are independent, conditioning on $V$ in the above
heuristic leads to $\theta_0$ depending upon $V$. We, nonetheless,
conjecture "approximate admissibility" for members of (4) meeting the
conditions of Theorem 2.

FOOTNOTE 

1. When $\sigma^2$ is unknown, we customarily assume an estimator $S^2$
of $\sigma^2$ such that $\nu S^2 \sim \sigma^2\chi^2_\nu$, independent of
$\hat\alpha$. In the foregoing predictors, $\sigma^2$ is replaced by
$cS^2$. As Lawless (1981, pp. 463-464) notes, when $\nu \to \infty$ and
even when $\nu$ is moderate, resulting MSE will differ little from that
with $\sigma^2$ known.